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^ . Abstract 

• I A wavelet dictionary comprising a set of Gabor functions is learned from 

O ■ natural images using a sparse image code. Gabor functions describe the re- 

ceptive field properties of simple cells in the primary visual cortex, and form 
a basis for efficiently coding natural image patches. Each Gabor function is 
^^ ! completely specified by five parameters, and each parameter is treated as a 

CN I random variable. I derive a learning rule and use it to learn a joint probabil- 

ity distribution over the Gabor parameters. Gabor parameters corresponding 
to receptive-field size and spatial frequency are found to be strongly corre- 



O ■ lated, and Pareto distributed: revealing that Gabor functions in the dictio- 

^^ ' nary are scale invariant over a range of length scales. Parametric models 

of the joint probability distribution are estimated. Synthetic dictionaries 
sampled from these parametric models are shown to perform well on image- 
rs ' patch reconstruction, and highlight the importance of taking into account 
c^ . Gabor-parameter correlations. This approach generalizes uniform sampling 
of wavelet parameters, to sampling of wavelet parameters from non-uniform 
distributions learned from natural image data - thereby adapting wavelets 
to the statistics of the data. 

1 Introduction 

We know that simple cells in the primary visual cortex have spatially-localized 
receptive fields, and are tuned to stimulus fe atures such as o rientation, spa- 



tial frequency, and location in the visual field ( ISwindald . Il996[ ) . These simple 

1 



cells are the final stage of a mapping of stimulus features from the visual 
field to a retinotopic position on the surface of the primary visual cortex 



( Durbin and Mitchisonl . Il990l ). Various models have been proposed to ex- 



plain how neural activity and visual experience cou ld lead to development 
and organization of a cortical map (jSwindald . Il996l ). For example, neural 
network models were used to describe the self-organization of orientation se- 
lectivi ty through a Hebbian le arning rule with localized, oriented input pat- 
terns (jvon der Malsburgi . Il973l ) , o r through a sym metry-breaking mechanism 
with uncorrelated random input (JLinskerl . Il986l ) . 

In addition to understanding the development and organization of a cor- 
tical map, it would be extremely useful to understand its function. The 
primary visual cortex represents an important stage in processing informa- 
tion along the visual pathway. Models capable of efficient coding of natural 
sensory data have been shown to develop realisti c neural receptive fields that 
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are adapted to the statist i cs of their input data (lOlshausen and Field 
19971 : iBell and Sejnowskil . 119971 ). In this class of models, input consists of 
natural image patches - i.e., natural sensory data at the level of photorecep- 
tors in the retina, and a Hebbian-like learning rule implementing an efficient 
coding strategy results in the adaptation of a set of basis functions analogous 
to the self-organization of orientation selectivity. The set of basis functions 
(called a dictionary) can be used to efficiently code and reconstruct natural 
image patches. 

The output of an efficient coding model is an image code with fewer sta- 
tistical dependencies than found between pixels in the input image. During 
the process of image formation light refle cted f r om o ne or more physical 
objects is projected onto an image plane ( iHornl . Il986l ). so that particular 
combinations of image pixels (the pixel correlations) contain a lot of infor- 
mation about the underlying objects in a compli cated way that depends on 
lighting, occlusion, viewpoint, and other effects (jUUmaru . Il996l ). Assuming 
different image sources to be largely independent of each other, the image 
code attempts to uncover t he "causes" of an image in terms of statistically 



independent image sources ( jBell and Sejnowskil . 119971 ). These models work 
by making use of the high-order statistics of natural images, either through 
the as sumed form of a prior distribution over output comp onents of the image 
code (lOlshausen and Fieldl . 119971 : iHyvarinen et. al.l . 12009), or through the as- 
sume d form of a nonlinear neural input-output function (JBell and Sejnowskil . 
1995h . 

A more traditional approach to efficient coding of image data is wavelet 



analysis. Receptive fields of individual simple cells are well- described by the 
so-called Gabor function (Marcelja, 1980; also see references in Field, 1999), 
which is a function that has the property of minimum uncertainty for the 
simultaneous co-localization of a signal in space and in spatial frequency. 
Two-dimensional (2D) Gabor tra nsforms were first used for image analysis 
and compression (JDaugmaru . Il988l ) , and for an early demonstration of a sparse 
image code that reduces entropy and decorrelates image pixels (JDaugmanl . 
19891 ). Lee later derived conditions for when a set of non-orthogonal 2D 
Gabor wavelets provides a comple te im age representation by generalizing 
the frame criterion for ID wavelets ( jLed . Il996). Application of this criterion 
led to the Gabor parameter quanti zations (or sampling rates) necessary for 
generating a complete image code (JLed . 119961 ). 

Effic ient coding i s possible due to the s tatistical properties of natural 



images fJFieldl . Il994j : iHyvarinen et. al.l . |2009| ). Strong pixel correlations in 
natural images lead to a high degree of redundancy: the distribution of 
natur al images in pixel-space is non-uniform and concentrated in a small 
area (JFieldl . Il994j ) , while random images are uniformly distributed in pixel- 
space. An approximate 1// Fourier amplitude spectrum (or 1//^ power spec- 
trum) completely describes the second-order statistics of pixel correlations 
in natural images. These correlations can be r emoved using either pri ncipal 
component analysis (PCA), or filter methods ( Hyvarinen et. al.l . l2009l ). The 



Fourier phase spectrum c onvey s most of the image content and the higher- 



order pixel correlations (JFieldl . Il994j ). Some of these correlations can be 



removed by seeking a linear tra nsformatiq i i that generates an imag e code 



with sparse output components (JDaugmanl. Il989l: iBell and Sejnowskil . 11995 



Olshausen and Field! . 119971 : IHyvarinen et. al.l . l2009l ). In this case, each out 



put component comes from a low-entropy marginal distribution, and there- 
fore reduces the mutual information (the joint entropy remains approximately 
unchanged, while the individual entropies are minimized). Reducing the mu- 
tual information leads to output components that are closer to being statisti- 
cally independent. Further p ixel correlations can be removed with non-linear 
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extensions of these models (iHvvarinen and Hoyer 
200ll : iKoster and Hvvarinenl . l2010h " 

For a learned dictionary to provide a useful basis for describing natural 
image patches, it must itself represent or refiect the statistics of natural 
images at some level. A natural question is: Can we extract this statistics 
in a meaningful way? Can we understand natural image statistics from the 
statistics of a dictionary; rather than from the statistics of image pixels, or 



an image code? 

In this Article, I attempt to address these questions by learning wavelet 
correlations and statistics for a dictionary of Gabor functions. Each Gabor 
function is completely specified by five parameters, and each parameter is 
treated as a random variable. The dictionary statistics is given by a learned 
joint probability distribution over the Gabor parameters. I then estimate 
parametric models for this learned joint distribution, and determine how 
well dictionaries generated from these parametric models perform co mpared 



to the learned dictionary. This approach generalizes that of lLed ( 119961 ): Here, 
Gabor parameters are sampled from non-uniform distributions learned from 
data, while Lee used a uniform sampling scheme. 

The structure of this Article is as follows: In Section 2, the parametric 
Gabor dictionary is constructed, and a learning rule for the Gabor parameters 
is derived. In Section 3, this learning rule is used to learn the joint probability 
distribution of Gabor parameters, and statistical dependencies between pairs 
of Gabor parameters are characterized. Parametric models of the learned 
joint distribution are estimated in Sections 3 and 4, and synthetic dictionaries 
are then constructed and compared in Section 4 by sampling from these 
parametric models. 

2 Model 

The image model I apply is a linear generative model. Using the vector 
r = {x,y) to label the discrete pixel coordinates of image /(r), each image 
is assumed to be generated by a sum of basis functions g{r, r'): one for each 
non-zero basis coefficient a(r'), and Gaussian noise A^(r), as 

/(r) = 5^^(r,r')a(r') + Ar(r). (1) 

r' 

In order to investigate dictionary statistics I choose a parametric form for 



the d i ctionary g . Motivated by earlier work ( jMarceljal . Il980l : iDaugmanl . Il988l . 



19891 : iLed . Il996l ). I start with the Gabor function: 



G(r) = exp ( ^\/''' ) cos (2^^ + ^ ) , (2) 



with 

(x,y) = i ^ ^ , . 

siiKp cos(p I \ y 



. .. , COS0 -sm0 \ / X > , . 

^^y)= .;^A ...^ ., ' (3) 



giving a Gaussian- windowed sinusoid with five parameters (0, cr, 7, A, </)). 
From the perspective of neuroscience, each of these parameters tells us some- 
thing about the receptive field properties of simple cells in the primary vi- 
sual cortex. A parameter value of 7 7^ 1 means a simple cell responds more 
strongly to a visual stimulus given by a bar or an edge instead of a circu- 
lar spot. In this case, the optimal stimulus response is given for a stimulus 
orientation cf). The parameters a and A correspond to how well localized a 
receptive field is in space, and in spatial frequency 1/A, respectively. The ip 
parameter corresponds to the simple-cell phase response. 

To learn values for the Gabor parameters from natural images, I construct 
a parametric dictionary by generalizing the Gabor function in a straightfor- 
ward way as 



^ir,r. 



exp 



-y' 



7(r')^a;^ 



2a(r 



'^2 



COS I 27r 



y 
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+ v^(r'; 
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(5,y) 



cos I 
sin< 



— sm(/)(^: 
cos (/)('r 



X — X 

y-y' 



(4) 



(5) 



Gabor functions uniformly tile space in this construction: at each discrete 
spatial position r there is a Gabor function centered at r = r' with a unique 
set of parameters 0(r'), cr(r'), 7(r'), A(r'), and ^ijc'). Although it is the basis 
functions that are chosen to have a Gabor form, receptive fields (given by 
the responses of each a(r) to inputs consi sting of spots at eve r y pos ition) 
are similar in form to the basis functions ( lOlshausen and Fieldl . 119961 ): and 
have essentially the sarn e orientation, location, and spatial frequency tuning 



f lHvvarinen et. all. l2009f) 



Following lOlshausen and Fieldl (1 19971 ). and lLewicki and Olshauseru (jl999f ). 
the probability of generating a particular image / is assumed to be given by 
a latent variable model of the form: 



p{i\e) 



daP{I\e,a)P{a) 



(6) 



where a are the unobserved (latent) variables, and 0{r) = (0(r), o"(r), 7(r), A(r) 
is a vector of the five Gabor parameters. For the case of Gaussian noise with 
variance a': P{I\0,a) oc Hr^^P (~^(^)^/2^' ); while the marginal distribu- 
tion for a is assumed to be sparse and to factor: P{a) oc Hr^^P (~/^'S'(a(r))). 
Usual choices for P{a) include the Cauchy distribution, i.e. S{x) = log (1 + x^); 



^W) 



the "logistic" distributi on, i.e. S(x) 
bution, i.e. S{x) = \x\ (JHyvarinen et. al. 



log (cosh x); or the Laplacian distri- 
20091 ). In the present work, the 



Cauchy distribution is chosen. 

Estimating parameter s in a latent v ariable model can be done efficiently 
using the EM algorithm (JBishopl . |2006| ). The E-step begins by inferring the 
latent variables a, given 6 and /. Using Bayes' rule, P{a\I, 0) can be written 
as 

P{a\I,e)(xP{I\e,a)P{a). (7) 

Since the expectation over P(a|/, 6) cannot be evaluated analytically, a good 
estimate of the latent variables must be found. One approach (known as 
is to use the Maximum Posterior (MAP) estimate for a. Upon defining 
E = —\og[P{I\6,a)P{a)], and using Eq. ([7]) with the presented forms for 
P{I\6,a) and P{a), this can be written as 



a = arg max P(a|/, 0), 

a 

= arg min E, 



(8) 
(9) 



where 



E 



j'^i 



E 



IM 



E^i 



r, r a r 



+ A5(a(r)) 



(10) 



with A = 0" (3. Finding the MAP value for a therefore reduces to simulta- 
neously minimizing the least-squares error and sparseness terms in Eq. (ITO|) . 
For th e Cauchy distribution, this has be en done using conjugate gradient de- 
scent (lOlshausen and Fieldl . ll996Ul997l ). For the Laplacian distribution, this 
is known as basis pursuit denoisi nq; and many fast and effici ent algorithms 
have recently been developed (see iGoldstein and Osherl . l2009l . for example). 
The M-step involves maximizing the average log-likelihood {log P{I\6)) 
with respect to the Gabor parameters 6. The average log-likelihood is the 
likelihood function given by Eq. ([6]), averaged over a batch of image patches. 
Maximizing this quantity is equivalent to minimizing the KuUback-Leibler 
divergence between the distribution of images in nature, and the d i stribu - 
tion of images generated from the image model ( lOlshausen and Field! . 119971 ). 



Maximizing the log-likelihood is implemented using gradient ascent: 

A^.(r) = r/.^(logP(/|0)), (11) 

1 d 



P(,|,)a,.„i:^('l».«W«)). (12) 

".(pi]i^i:^(^i«.<'W«)(-||))). (13) 
mE^(«i«.^)|^). (") 



dE 



(15) 



where t^j are the learning rates, and E is given by Eq. (TTOj) . The gradient can 
be written as 

where the residual error r(r) is defined as 

r{v) = I{v)-Y,g{vy)a{v'). (17) 



Using these two expressions in Eq. f fTSj) leads to 



^^^w = ^^E^CTir (^^^ 



where the partial derivatives for g are given in Append. A. Updating each 
Gabor parameter therefore requires the calculation of two expectations. The 
inner expectation in Eq. (1181) is with respect to the posterior distribution 
P{a\6,I) given by Eq. ([7]), and is evaluated in the E-step. The outer ex- 
pectation is an average over a batch of image patches. Evaluating both 
expectations, then adjusting each Gabor parameter according to Eq. flTSjl . 
constitutes the M-step. The EM algorithm consists of alternat i ng be tween 



the E-step and the M-step until convergence is reached (JBishopl . l2006l ). 



The approach outhned here allows modeling of Gaussian noise and an 
overcomplete basis. If the noise level is zero and the basis is complete, the 
E-step can be avoided, and the ICA learning rule follows. In this case, 
Eq. ([1]) can be inverted to give a(r) = '^^, g{r, r')~^J(r'), and the distribution 
P{I\6, a) in Eq. ([6]) becomes a delta-function. Performing the sum over a in 
Eq. ([H]) then yields P{I\6) = P{g^^I). Maxi mizing this likelihood fo rms the 



basis of the FastICA algorithm discussed in iHyvarinen et. al.l ( 120091 ) 



In the presence of Gaussian noise or for an overcomplete basis, the E-step 
is usually performed either by sampling from P{a\6, 1), or by using its MAP 
estimate from Eq. ([8]). For the case of the MAP approach, the learning rule 
in Eq. (ITH]) becomes 



Ae,{r)=v,Yl 



dg' 



r, r 



de,{r) 



(d(r)f(r')) 



(19) 



where f is the residual error from Eq. ( IT71) with a instead of a, and a is the 
MAP estimate given by Eq. (|8l). This l earning rule turns out to be similar in 
form to that of lOlshausen et. all (120011 ) . except the dictionary elements in g 
are now specialized to Gabor functions. The drawback of using MAP instead 
of sampling in the E-step is encountering a trivial solution given when both 
terms in Eq. ( ITOl) are minimized by a small v alue of a, and a large value 
of g; such that ga ~ /. To avoid this solution, lOlshausen and Fieldl (119971 1 
introduced a learning rule for the L2-norm of the basis functions. Here, I 
define 

L,{v) = v/^W^, (20) 

and avoid the trivial solution by supplementing Eq. ( TT9l) with the learning 
rule: 

L2(r)-- = L2(r)°'^ ^^^ , (21) 



O", 



goal 

to update the a{r) values. 

3 Results 

Now I apply the learning rules given by Eqs. (IT9l) - (l2Tl) to a set of image 
patches of size 16 x 16 pixels (A^ = 256 pixels). A total of 10^ image patches 
in batches of 100 were randomly selected from 10 images of natural scenes 
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Figure 1: Initial dictionary of 256 Gabor basis functions. The parameters 
and if were chosen at random from uniform distributions over and vr 
(for 0), and — vr and vr (for ip). Other parameters were chosen as: 7 = 1; 
a = 0.5^/N; and A = O.b^/N (see text). 



taken from ISparseneu The image pre-proces sing includes both wh i tenin g 
and dimensionality reduction, as described in lOlshausen and Fieldl ( 119971 ). 
The MAP approach was used in the E-step with the Cauchy distribution 
and conjugate gradient descent in order to facilitate a close comparison with 
Olshausen's and Field's non-parametric dictionary. 

The first objective is to demonstrate that the learning rules given by 
Eqs. (I19p -( 1?T]) lead to a set of basi s functions that are oriente d, localized, 
and bandpass as originally shown in lOlshausen and Fieldl (Il996[ ) . To demon- 
strate this, I start with the opposite case given by the initial dictionary in 
Fig. 1: each basis function is circularly symmetric, global in space (covering 
a whole image patch), and global in wavenumber space. The learned dictio- 
nary in Fig. 2 clearly shows the set of basis functions have become oriented, 
locahzed, and bandpass (see also Fig. 4). This dictionary is approximately 
1.4-times over-complete: its singular values drop off sharply after about 180 
dimensions. 

To show that the learned parametric dictionary is approximately as ef- 
fective as a non-parametric dictionary learned using \Sparsenen . I compare 
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Figure 2: Learned dictionary of 256 Gabor basis functions. Statistics of the 
Gabor parameters are shown in Figs. 4 and 5. 

image-patch reconstruction performance. For each of the dictionaries I com- 
pute the MAP value of a using Eqs. (^ and ( ITU]) for a test image patch chosen 
at random. I then find the patch reconstruction error given by ||-f — fl'ap- I 
repeat this procedure for 300 test patches and take the average. This entire 
procedure is then repeated for different A values to determine image-patch 
reconstruction performance at different sparseness levels. The mean recon- 
struction error at different sparseness levels is shown in Fig. 3. The sparse- 
ness norm for a single patch = J2r '^'(^(i"))/ J2r '^'(-^(r)), and has a small value 
when A is large and a is sparse. Averaging over all patches gives the mean 
sparseness norm plotted in Fig. 3. This figure shows that the learned dictio- 
nary performs significantly better than the initial dictionary, and has similar 
performance to the non-parametric dictionary. It also shows reconstruction 
performance degrading with increasing sparseness of a (i.e., decreasing mean 
sparseness norm), as expected. 

Having established these benchmark results, the main aim of this work 
is to investigate and model the joint probability distribution of the learned 
Gabor parameters. Histograms of the learned Gabor parameters are shown 
in Fig. 4. I fitted continuous distributions to this data which are shown as 
the solid lines in each of the histogram plots. I removed 5 outliers from the 
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Figure 3: Mean reconstruction error for a set of 300 test-patch reconstruc- 
tions at different sparseness levels (see text). Results from using the dic- 
tionary of initial values from Fig. 1 (Asterisks) , the learned dictionary from 
Fig. 2 (Crosse s), and a d ictionary of non-parametric basis functions (Circles) 
learned nsmg lSparsenet . 



256 data points in order to generate these plots. The fitted distributions 
and their parameter estimates are given in Table 1; parameters with hats 
correspond to maximum-likelihood estimates. 

The histograms in Fig. 4 are well-fitted by uniform, Pareto, and Gaus- 
sian univariate distributions. Isotropy in natural images is a reasonable first 
approximation ( JHyvarinen et. al.U2009l ). so characterizing the orientation pa- 
rameter with a uniform distribution seems appropriate. Similarly, the phase 
parameter ip appears to be uniformly distributed as in the initial dictionary. 
The aspect-ratio 7 is Gaussian distributed with a mean < 1, indicating a 
significant deviation from the unit aspect ratio of basis functions with cir- 
cular symmetry. The two spatially-depe ndent parame ters a and A are each 
well-described by a Pareto distribution (JMardial . Il962l ) of the form 



/ I tXj H. • tAj y 



-(a+1) 3. > a; 







t/j ^~^ tij 7-1 



(22) 



where x^ > sets the minimum length scale, and the exponent a > 
describes the distribution shape. The power-law form of this distribution 
means it is invariant to a scale transformation: x' = kx. This implies Gabor 
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Figure 4: Histogram and fitted distribution (solid line) for each Gabor pa- 
rameter of the learned dictionary in Fig. 2. To generate these plots 5 outliers 
with a > 2.SyN were removed from 256 data points. 
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Pareto 


a = 1.91 
(T„ = 0.134 
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a = 


- a„«-(v^) 
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Pareto 


a = 2.59 
A^ = 0.203 


^/N 
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= A^u-d/") 


7 


Gaussian 


fL = 0.767 
a = 0.045 




7 = 


= /i + (72; 



Table 1: Univariate probability distribution and parameter estimates used 
to fit each histogram in Fig. 4. The sampler for each distribution is also 
shown: where u and z are drawn from standard uniform, and standard normal 
distributions, respectively. 

functions in the learned dictionary have no characteristic length scale over 
the ranges a^ < cr < (jpatch and A^ < A < Apatch; where apatch and Apatch 
are maximum length scales determined by the patch size. 

The histograms and fitted distributions in Fig. 4 and Table 1 represent 
the marginal distributions of the Gabor parameters. To see statistical depen- 
dencies in the joint probability distribution, I show scatter plots in Fig. 5 for 
each pair of correlated Gabor parameters with p- value < 0.05. It is seen that 
a strong correlation (line of best-fit slope=0.92) between a and A has been 
learned. A weaker correlation (line of best-fit slope=0.18) has been learned 
between a and 7. No other parameter pairs are found to exhibit statistically 
significant correlations. 



4 Synthetic Dictionaries 

Parametric models of the joint probability distribution of learned Gabor pa- 
rameters are now estimated. Th e simplest parametric model is given by the 



graphical model ( iBishopl . l2006l ) shown in Fig. 6(a). In this case no corre- 
lations are modeled, and each parameter is independently sampled from its 
appropriate marginal distribution in Table 1. 

Alternative parametric models can be constructed from the graphical 
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Figure 5: Scatter plot and line of best fit for each pair of correlated Gabor 
parameters with p-value<0.05. (Top) Correlations between a and 7, with line 
of best-fit slope=0.18. (Bottom) Correlations between a and A, with line of 
best-fit slope=0.92. To generate these plots 5 outliers with a > 2.8vN were 
removed from 256 data points. 
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model shown in Fig. 6(b). In this case, I model the correlation in Fig. 5 
(Bottom Panel) as P(A, a) = P{A\a)P{a); where the conditional dependency 
P(A|o") is depicted by the arrow in Fig. 6(b). I approximate this dependency 
using two alternative approaches. The first approach is to use a line-of-best- 
fit model where I sample a from the Pareto distribution V{a\a,am) using 
the sampler in Table 1, then generate a sample of A according to 

A = aa + b, (23) 

where a = 0.92 is slope of the line of best-fit to the data in Fig. 5 (Bottom 
Panel), and b = A^ — aam so that A = A^ when a = a^ in Eq. f l23|) . The 
resulting scatter plot is shown in Fig. 7 (Middle Panel), and models P(A|cr) 
as a delta-function: P{K\a) = 6{A — aa — h). 

T he second approa ch models P(A, a) using the Gaussian copula func- 



tion (JEmbrechtsl . l200ll ). In this case, the dependency is included through a 
bivariate Gaussian A/'(0, S) with 

by choosing the value of p > (in Figs. 7 and 8, I chose p = 0.9). Samples 
(xi, X2) are then drawn from this distribution using x = Az; where A is the 
Cholesky decomposition of S, and z = {zi,Z2) - with zi and Z2 each drawn 
from the standard univariate normal A/'(0, 1). Next, the samples are trans- 
formed to uniform marginal distributions using (^1,^2) = {^{xi),^{x2))', 
where $ is the cumulative distribution function (CDF) for the standard uni- 
variate normal. Finally, samples of A and a are obtained using (A, a) = 
($~^(ui), $"^(^2)); where $~^ is the inverse CDF for the Pareto distribu- 
tions in Table 1. The resulting scatter plot is shown in Fig. 7 (Bottom Panel). 

Synthetic dictionaries with statistical properties similar to those learned 
from natural images can now be generated by drawing samples from these 
parametric models. To find which dictionary better captures the statistical 
properties of natural images, I compare performance on image-patch recon- 
struction at different sparseness levels in Fig. 8. The sparseness norm for a 
single patch = J2r^i^i^))/^r^iH'^))'^ ^^^ is averaged over all patches to 
get the mean sparseness norm, as explained in relation to Fig. 3. Results 
from the initial and learned dictionaries in Figs. 1 and 2 are shown as a 
comparison. 
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Figure 6: Two minimal graphical models of the joint probability distribution 
of learned Gabor parameters, (a) Model with no dependencies between Ga- 
bor parameters, (b) Model with the conditional dependency P{A\a) depicted 
by the arrow. 

A dictionary corresponding to the graphical model in Fig. 6(a) is con- 
structed by sampling the univariate distributions in Table 1. In this case 
there is no dependency between A and a, as shown in Fig. 7 (Top Panel). In 
Fig. 8, it is seen that this uncorrelated dictionary leads to a significant in- 
crease in performance over the initial dictionary. The difference is the result 
of sampling the Gabor parameters a, 7, and A from non-uniform distribu- 
tions learned from natural image data. 

Dictionaries corresponding to the graphical model in Fig. 6(b) include a 
correlation between A and a. Sampling 0, ip, a, and 7 from the univariate 
distributions in Table 1, and using Eq. f l23|l to generate A, leads to a line- 
of-best-fit (LBF) correlated dictionary. The modeled dependency between A 
and a is shown in Fig. 7 (Middle Panel). Sampling A and a directly from 
the Gaussian copula leads to a copula-correlated dictionary, and the mod- 
eled dependency between A and a is shown in Fig. 7 (Bottom Panel). In 
Fig. 8, it is seen that including the correlation between A and a leads to a 
significant increase in performance over the uncorrelated dictionary; suggest- 
ing this correlation is crucial for generating good synthetic dictionaries. The 
Gaussian copula model does slightly worse than the line-of-best-fit model - 
despite making use of an extra parameter (given by the a parameter for the 
Pareto distribution over A). 

Summary and Discussion 

In this work, I constructed a parametric wavelet dictionary comprising a 
set of Gabor functions, and learned the joint probability distribution of the 
Gabor parameters from data consisting of natural images. The dictionary 
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Figure 7: Scatter plots for a and A sampled from different models of the sta- 
tistical dependency in Fig. 5 (Bottom Panel). (Top) No dependency. (Mid- 
dle) Line-of-best-fit dependency. (Bottom) Gaussian-copula dependency. 



17 



3.5 



2.5 



a 1.5 

o 



0.5 



^V, 



**** * 



* Initial Dictionary 

Uncorrelated Dictionary 

□ LBF-Correlated Dictionary 

o Copula-Correlated Dictionary 

X Learned Dictionary 




0.5 1 1.5 2 

Mean sparseness norm 

Figure 8: Mean reconstruction error for a set of 300 test-patch reconstructions 
at different sparseness levels (see text). Results are shown from using the dic- 
tionary of initial values from Fig. 1 (Asterisks), the learned dictionary from 
Fig. 2 (Crosses), and three synthetic dictionaries (Diamonds, Squares, Cir- 
cles). The uncorrelated dictionary (Diamonds) is represented by the graphi- 
cal model in Fig. 6(a), and has no modeled dependencies. Correlated dictio- 
naries are represented by the graphical model in Fig. 6(b); and dependency 
between A and a is modeled using either a line of best fit (Squares), or a 
Gaussian copula (Circles). 



was constructed so that the set of Gabor functions uniformly tiled space. 
The learned Gabor parameter statistics resulted in a uniform tiling of ori- 
entation and phase, and a non-uniform tiling of receptive-field (windowing) 
size; spatial frequency; and aspect ratio. The joint probability distribution 
of learned Gabor parameters was shown to have a strong correlation between 
the receptive-field size and spatial frequency parameters. 

I estimated three simple parametric models of the learned joint proba- 
bility distribution, and constructed synthetic dictionaries from these models. 
A comparison of each synthetic dictionary on image-patch reconstruction 
performance showed that the two correlated dictionaries modeling the de- 
pendency between receptive-field size and spatial frequency outperformed 
the uncorrelated dictionary, and approached the performance of the learned 
dictionary. By virtue of having a parametric model for both the joint prob- 
ability distribution and the wavelet dictionary, these synthetic dictionaries 
are fully parametric; and depend only on a handful of parameters. The work 
presented here generalizes approaches based on uniform wavelet parameter 
sampling, to wavelet parameters that are sampled from non-uniform distri- 
butions learned from natural image data - thereby adapting wavelets to the 
statistics of natural images. It would be interesting to see a detailed com- 
parison between these two different approaches. 

Comparing Gabor parameter statistics would be interesting for different 
classes of images, such as images of man-made environments versus images of 
natural scenes. For images of man-made environments, histograms of the ori- 
entation and receptive-field-size parameters are expected to be quite different 
to those presented here. Another application could be to use synthetic dic- 
tionaries as initializers for non-parametric learning algorithms. This might 
speed up convergence, and allow different local maxima of the likelihood to 
be explored by systematically varying the parameter statistics. 

If dictionary statistics are found to be robust over different sparse coding 
schemes and different natural-image data sets, then the technique of learning 
the joint probability distribution of wavelet parameters from data may lead to 
further insight into efficient coding strategies, and to a better understanding 
of the function of simple cells in the primary visual cortex. 
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Appendix 

The partial derivatives required in Eqs. f llSp and f ll9p are given here: 



90(r) 
dg'^{r, r' 



da{r) 






9A(r) 
dg^{r, r' 



(7(r)^ - l)^r^g' (r, r') - 2n-—h' (r, r'), (25) 



„~,2 I ,,/'„\2~2 



A(r) 



r + 7(r)'x' \ T( ,. 
9 (r,r), 



air 



7(r)x2 ^ 
— r^9 r,r 
o- r r 



'^'h-iry), 



d(p{r) 



A(r)2 
-/i^(r,r') 



(26) 
(27) 
(28) 
(29) 



where g^{r,r') = g{r',r) is the transpose of g{r,r'), and h'^{r,r') = h{r',r) 
is the transpose of 

Mr, rO = exp ( "'',;ff''' ) sin (2vr-A_ + ^(r')) , (30) 

and {x,y) are defined as in Eq. ([5]). 
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